Journal: BMJ Health & Care Informatics
Article Title: Finding a constrained number of predictor phenotypes for multiple outcome prediction
doi: 10.1136/bmjhci-2024-101227
Figure Legend Snippet: Task 1–3 results. The test set AUROC performance of the different models (constrained LR, constrained GBM, most-comprehensive-case LR, simplest-case LR and existing models) across the databases for the prediction tasks with existing models. The red lines are the 95% CIs. The Optum EHR and Optum CDM databases are shaded since these databases were not used to learn the predictor phenotypes. AUROC, area under the receiver operating characteristic curve; GBM, gradient boosting machines; LR, logistic regression; Optum CDM, Optum’s de-identified Clinformatics Data Mart Database; Optum EHR, Optum de-identified Electronic Health Record.
Article Snippet: Two held-out databases (Optum de-identified Electronic Health Record data set (Optum EHR) and Optum’s de-identified Clinformatics Data Mart Database (Optum CDM)) were only used to develop the models using the constrained predictors.
Techniques: